Overview

Dataset statistics

Number of variables: 40
Number of observations: 254
Missing cells: 708
Missing cells (%): 7.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 369.9 KiB
Average record size in memory: 1.5 KiB
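
The percentage and per-record figures in this block follow from the raw counts. A minimal sketch in Python, assuming the missing-cell percentage is the missing count divided by variables × observations, and the average record size is total memory divided by the row count. Both definitions are assumptions about how the report derives them, although they reproduce the reported values:

```python
# Counts copied from the dataset statistics of this report.
n_variables = 40
n_observations = 254
missing_cells = 708
total_memory_kib = 369.9

# Assumed definition: share of missing cells among all cells.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumed definition: total memory spread evenly over the rows.
avg_record_kib = total_memory_kib / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```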

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2017" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
DEXAME has a high cardinality: 153 distinct values High cardinality
SG_UF_NOT is highly correlated with ID_MUNICIP High correlation
ID_MUNICIP is highly correlated with SG_UF_NOT High correlation
SEM_PRI is highly correlated with NU_IDADE_N High correlation
NU_IDADE_N is highly correlated with SEM_PRI High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
COUFINF is highly correlated with CS_RACA and 9 other fields High correlation
PMM is highly correlated with AT_SINTOMA and 4 other fields High correlation
CS_RACA is highly correlated with COUFINF and 6 other fields High correlation
RESULT is highly correlated with COUFINF and 11 other fields High correlation
AT_SINTOMA is highly correlated with PMM and 4 other fields High correlation
ID_REGIONA is highly correlated with SG_UF_NOT and 2 other fields High correlation
SG_UF_NOT is highly correlated with ID_REGIONA and 2 other fields High correlation
SEM_NOT is highly correlated with CS_RACA and 1 other field High correlation
DTRATA is highly correlated with COUFINF and 14 other fields High correlation
AT_LAMINA is highly correlated with RESULT and 3 other fields High correlation
CS_ESCOL_N is highly correlated with CS_RACA and 5 other fields High correlation
ID_MUNICIP is highly correlated with ID_REGIONA and 2 other fields High correlation
ID_OCUPA_N is highly correlated with ID_REGIONA and 6 other fields High correlation
NU_IDADE_N is highly correlated with DTRATA and 3 other fields High correlation
COMUNINF is highly correlated with COUFINF and 10 other fields High correlation
CLASSI_FIN is highly correlated with COUFINF and 10 other fields High correlation
LOC_INF is highly correlated with COUFINF and 11 other fields High correlation
COPAISINF is highly correlated with PMM and 7 other fields High correlation
DSTRAESQUE is highly correlated with COUFINF and 7 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 8 other fields High correlation
CS_GESTANT is highly correlated with CS_SEXO High correlation
TRA_ESQUEM is highly correlated with COUFINF and 10 other fields High correlation
AT_ATIVIDA is highly correlated with RESULT and 9 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with COUFINF and 10 other fields High correlation
ID_MN_RESI is highly correlated with PMM and 4 other fields High correlation
COUFINF is highly correlated with DTRATA and 9 other fields High correlation
ID_REGIONA is highly correlated with ID_PAIS and 6 other fields High correlation
DTRATA is highly correlated with COUFINF and 14 other fields High correlation
CS_ESCOL_N is highly correlated with ID_PAIS and 5 other fields High correlation
ID_OCUPA_N is highly correlated with ID_PAIS and 5 other fields High correlation
DSTRAESQUE is highly correlated with DTRATA and 6 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields High correlation
CS_SEXO is highly correlated with ID_PAIS and 6 other fields High correlation
LOC_INF is highly correlated with COUFINF and 8 other fields High correlation
SG_UF is highly correlated with COUFINF and 24 other fields High correlation
CS_RACA is highly correlated with ID_PAIS and 5 other fields High correlation
RESULT is highly correlated with COUFINF and 14 other fields High correlation
AT_SINTOMA is highly correlated with ID_PAIS and 9 other fields High correlation
SG_UF_NOT is highly correlated with ID_REGIONA and 6 other fields High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields High correlation
AT_LAMINA is highly correlated with ID_PAIS and 9 other fields High correlation
COMUNINF is highly correlated with COUFINF and 9 other fields High correlation
TPAUTOCTO is highly correlated with DTRATA and 11 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields High correlation
CS_GESTANT is highly correlated with ID_PAIS and 6 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields High correlation
TRA_ESQUEM is highly correlated with DTRATA and 10 other fields High correlation
AT_ATIVIDA is highly correlated with ID_PAIS and 9 other fields High correlation
CLASSI_FIN is highly correlated with DTRATA and 13 other fields High correlation
PCRUZ is highly correlated with DTRATA and 10 other fields High correlation
DT_INVEST has 254 (100.0%) missing values Missing
PMM has 200 (78.7%) missing values Missing
DT_ENCERRA has 254 (100.0%) missing values Missing
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 197 (77.6%) zeros Zeros

Reproduction

Analysis started: 2021-07-06 18:45:13.074285
Analysis finished: 2021-07-06 18:45:33.534277
Duration: 20.46 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.5 KiB
2
254 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters254
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

ValueCountFrequency (%)
2254
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2254
100.0%

Most occurring characters

ValueCountFrequency (%)
2254
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number254
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2254
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common254
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2254
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2254
100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size19.0 KiB
B54
254 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters762
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
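
The category counts in these sections come from Unicode character properties. A rough sketch of the category tally using Python's standard `unicodedata` module, assuming the column holds the constant string "B54" in all 254 rows as reported. The report's own implementation may differ, and script and block lookups are omitted because the standard library does not provide them:

```python
import unicodedata
from collections import Counter

# Assumption: 254 rows, each holding the constant value "B54".
values = ["B54"] * 254

# Tally the Unicode general category of every character.
# "Lu" is Uppercase Letter, "Nd" is Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

total_chars = sum(category_counts.values())
for category, count in category_counts.most_common():
    print(f"{category}: {count} ({100 * count / total_chars:.1f}%)")
```

Under that assumption the tally matches the figures listed for ID_AGRAVO: 508 decimal-number characters and 254 uppercase letters out of 762 characters.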

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54254
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54254
100.0%

Most occurring characters

ValueCountFrequency (%)
B254
33.3%
5254
33.3%
4254
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number508
66.7%
Uppercase Letter254
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5254
50.0%
4254
50.0%
Uppercase Letter
ValueCountFrequency (%)
B254
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common508
66.7%
Latin254
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5254
50.0%
4254
50.0%
Latin
ValueCountFrequency (%)
B254
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII762
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B254
33.3%
5254
33.3%
4254
33.3%
Distinct: 153
Distinct (%): 60.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
Minimum: 2017-01-05 00:00:00
Maximum: 2017-12-31 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 51
Distinct (%): 20.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201726.1378
Minimum: 201701
Maximum: 201801
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 201701
5-th percentile: 201703
Q1: 201711.25
median: 201727
Q3: 201739
95-th percentile: 201751
Maximum: 201801
Range: 100
Interquartile range (IQR): 27.75

Descriptive statistics

Standard deviation: 16.67738252
Coefficient of variation (CV): 8.267338432 × 10^-5
Kurtosis: 0.02193457029
Mean: 201726.1378
Median Absolute Deviation (MAD): 14
Skewness: 0.3518709587
Sum: 51238439
Variance: 278.1350876
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
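
SEM_NOT appears to encode an epidemiological week as a YYYYWW integer. This is an assumption based on the values running from 201701 to 201801. If it holds, numeric summaries such as the range of 100 reflect the encoding, not a count of elapsed weeks. A hedged sketch that splits such a code into year and week, and cross-checks the reported CV, variance, and IQR against the other reported statistics, assuming CV = standard deviation / mean, variance = standard deviation squared, and IQR = Q3 − Q1:

```python
from typing import Tuple


def split_epi_week(code: int) -> Tuple[int, int]:
    """Split an assumed YYYYWW integer into (year, week)."""
    return divmod(code, 100)


# Values copied from the SEM_NOT statistics of this report.
mean = 201726.1378
std = 16.67738252
q1, q3 = 201711.25, 201739

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the squared standard deviation
iqr = q3 - q1          # interquartile range

print(split_epi_week(201701), split_epi_week(201801))
print(f"CV = {cv:.9e}, variance = {variance:.4f}, IQR = {iqr}")
```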
Value              Count  Frequency (%)
201732             13     5.1%
201748             10     3.9%
201714             9      3.5%
201704             9      3.5%
201727             8      3.1%
201705             8      3.1%
201739             8      3.1%
201707             8      3.1%
201706             8      3.1%
201702             7      2.8%
Other values (41)  166    65.4%

Value              Count  Frequency (%)
201701             2      0.8%
201702             7      2.8%
201703             7      2.8%
201704             9      3.5%
201705             8      3.1%
201706             8      3.1%
201707             8      3.1%
201708             4      1.6%
201709             2      0.8%
201710             6      2.4%

Value              Count  Frequency (%)
201801             1      0.4%
201752             7      2.8%
201751             6      2.4%
201750             4      1.6%
201749             3      1.2%
201748             10     3.9%
201747             5      2.0%
201746             5      2.0%
201745             2      0.8%
201744             6      2.4%

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size15.3 KiB
2017
254 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters1016
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2017
2nd row2017
3rd row2017
4th row2017
5th row2017

Common Values

ValueCountFrequency (%)
2017254
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2017254
100.0%

Most occurring characters

ValueCountFrequency (%)
2254
25.0%
0254
25.0%
1254
25.0%
7254
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1016
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2254
25.0%
0254
25.0%
1254
25.0%
7254
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common1016
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2254
25.0%
0254
25.0%
1254
25.0%
7254
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1016
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2254
25.0%
0254
25.0%
1254
25.0%
7254
25.0%

SG_UF_NOT
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
33
252 
31
 
1
32
 
1

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters508
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.8%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33252
99.2%
311
 
0.4%
321
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33252
99.2%
321
 
0.4%
311
 
0.4%

Most occurring characters

ValueCountFrequency (%)
3506
99.6%
11
 
0.2%
21
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number508
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3506
99.6%
11
 
0.2%
21
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common508
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3506
99.6%
11
 
0.2%
21
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3506
99.6%
11
 
0.2%
21
 
0.2%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 16
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330317.878
Minimum: 310620
Maximum: 330630
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 310620
5-th percentile: 330240
Q1: 330455
median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 330630
Range: 20010
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1389.313515
Coefficient of variation (CV): 0.004205989467
Kurtosis: 170.4870624
Mean: 330317.878
Median Absolute Deviation (MAD): 0
Skewness: -12.74681902
Sum: 83900741
Variance: 1930192.044
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value             Count  Frequency (%)
330455            213    83.9%
330340            9      3.5%
330240            9      3.5%
330390            7      2.8%
330330            4      1.6%
330630            2      0.8%
330185            1      0.4%
330430            1      0.4%
330420            1      0.4%
330411            1      0.4%
Other values (6)  6      2.4%

Value             Count  Frequency (%)
310620            1      0.4%
320530            1      0.4%
330010            1      0.4%
330020            1      0.4%
330185            1      0.4%
330240            9      3.5%
330320            1      0.4%
330330            4      1.6%
330340            9      3.5%
330350            1      0.4%

Value             Count  Frequency (%)
330630            2      0.8%
330455            213    83.9%
330430            1      0.4%
330420            1      0.4%
330411            1      0.4%
330390            7      2.8%
330350            1      0.4%
330340            9      3.5%
330330            4      1.6%
330320            1      0.4%

ID_REGIONA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size15.3 KiB
252 
1449
 
1
1510
 
1

Length

Max length4
Median length0
Mean length0.03149606299
Min length0

Characters and Unicode

Total characters8
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
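
The length and character statistics for ID_REGIONA indicate that 252 of the 254 values are empty strings, which the report counts as a category and not as missing (Missing is 0). A small sketch reproducing the reported figures, assuming the column contains 252 empty strings plus the two codes "1449" and "1510" as listed:

```python
from statistics import mean, median

# Assumption: values reconstructed from the counts shown in the report.
values = [""] * 252 + ["1449", "1510"]

lengths = [len(v) for v in values]
total_chars = sum(lengths)
distinct_chars = set("".join(values))
n_empty = sum(1 for v in values if v == "")

print(f"total characters: {total_chars}")
print(f"distinct characters: {len(distinct_chars)}")
print(f"mean length: {mean(lengths):.11f}")
print(f"median length: {median(lengths)}")
print(f"empty strings: {n_empty} ({100 * n_empty / len(values):.1f}%)")
```

If the empty strings stand for "not recorded", converting them to missing values before profiling would change both the Missing count and the correlation warnings for this column.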

Unique

Unique2 ?
Unique (%)0.8%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
252
99.2%
14491
 
0.4%
15101
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
14491
50.0%
15101
50.0%

Most occurring characters

ValueCountFrequency (%)
13
37.5%
42
25.0%
91
 
12.5%
51
 
12.5%
01
 
12.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number8
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
13
37.5%
42
25.0%
91
 
12.5%
51
 
12.5%
01
 
12.5%

Most occurring scripts

ValueCountFrequency (%)
Common8
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
13
37.5%
42
25.0%
91
 
12.5%
51
 
12.5%
01
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII8
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13
37.5%
42
25.0%
91
 
12.5%
51
 
12.5%
01
 
12.5%

ID_UNIDADE
Real number (ℝ≥0)

Distinct: 54
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2945032.039
Minimum: 69
Maximum: 7859341
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 69
5-th percentile: 2269988
Q1: 2288338
median: 2288338
Q3: 2298090
95-th percentile: 7431063.8
Maximum: 7859341
Range: 7859272
Interquartile range (IQR): 9752

Descriptive statistics

Standard deviation: 1670552.125
Coefficient of variation (CV): 0.5672441258
Kurtosis: 2.691758054
Mean: 2945032.039
Median Absolute Deviation (MAD): 0
Skewness: 1.847697713
Sum: 748038138
Variance: 2.790744401 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
2288338            152    59.8%
7740476            9      3.5%
5462886            8      3.1%
2275589            7      2.8%
3375471            7      2.8%
3005992            7      2.8%
2269988            4      1.6%
7642423            3      1.2%
2270277            3      1.2%
3034984            3      1.2%
Other values (44)  51     20.1%

Value              Count  Frequency (%)
69                 1      0.4%
11738              1      0.4%
12505              2      0.8%
12734              1      0.4%
26050              1      0.4%
26948              1      0.4%
2269651            1      0.4%
2269783            2      0.8%
2269988            4      1.6%
2270277            3      1.2%

Value              Count  Frequency (%)
7859341            1      0.4%
7740476            9      3.5%
7642423            3      1.2%
7317255            2      0.8%
7251491            1      0.4%
7249942            1      0.4%
7110162            1      0.4%
6938124            1      0.4%
6899919            1      0.4%
6753469            1      0.4%

Distinct: 163
Distinct (%): 64.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
Minimum: 1984-03-03 00:00:00
Maximum: 2017-12-31 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 57
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201709.2008
Minimum: 198409
Maximum: 201801
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 198409
5-th percentile: 201701
Q1: 201709
median: 201726
Q3: 201738
95-th percentile: 201750
Maximum: 201801
Range: 3392
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 209.1504249
Coefficient of variation (CV): 0.001036890851
Kurtosis: 247.8679749
Mean: 201709.2008
Median Absolute Deviation (MAD): 15
Skewness: -15.65099953
Sum: 51234137
Variance: 43743.90024
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
201703             14     5.5%
201732             14     5.5%
201705             11     4.3%
201743             9      3.5%
201701             9      3.5%
201713             9      3.5%
201702             9      3.5%
201726             8      3.1%
201748             7      2.8%
201730             7      2.8%
Other values (47)  157    61.8%

Value              Count  Frequency (%)
198409             1      0.4%
201604             1      0.4%
201614             1      0.4%
201649             2      0.8%
201651             4      1.6%
201652             1      0.4%
201701             9      3.5%
201702             9      3.5%
201703             14     5.5%
201704             2      0.8%

Value              Count  Frequency (%)
201801             1      0.4%
201752             3      1.2%
201751             4      1.6%
201750             6      2.4%
201749             4      1.6%
201748             7      2.8%
201747             2      0.8%
201746             7      2.8%
201745             4      1.6%
201744             1      0.4%

Distinct: 200
Distinct (%): 78.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
Minimum: 1923-06-02 00:00:00
Maximum: 2017-03-14 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 70
Distinct (%): 27.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4017.397638
Minimum: 2000
Maximum: 4093
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 4014
Q1: 4030
median: 4039.5
Q3: 4052
95-th percentile: 4071
Maximum: 4093
Range: 2093
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 202.2427497
Coefficient of variation (CV): 0.05034173062
Kurtosis: 81.46279357
Mean: 4017.397638
Median Absolute Deviation (MAD): 10.5
Skewness: -8.84233966
Sum: 1020419
Variance: 40902.1298
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
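
NU_IDADE_N does not look like a plain age in years: nearly all values fall between 4001 and 4093, with a few at 2000, 2008, and 3010. One plausible reading, stated here as an assumption that the report does not confirm, is a composite code whose leading digit is the time unit (1 hours, 2 days, 3 months, 4 years) and whose remaining digits are the amount. Under that assumption, the hypothetical helper `decode_age` below converts a code into an amount and a unit:

```python
from typing import Tuple

# Assumed mapping from leading digit to time unit (not stated in the report).
AGE_UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}


def decode_age(code: int) -> Tuple[int, str]:
    """Split an assumed composite age code into (amount, unit)."""
    unit_digit, amount = divmod(code, 1000)
    if unit_digit not in AGE_UNITS:
        raise ValueError(f"unexpected unit digit in age code {code}")
    return amount, AGE_UNITS[unit_digit]


for code in (2008, 3010, 4033, 4093):
    amount, unit = decode_age(code)
    print(f"{code} -> {amount} {unit}")
```

If this reading is right, the mean, skewness, and kurtosis reported for the raw codes mix units and should not be read as statistics of age.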
Value              Count  Frequency (%)
4033               13     5.1%
4049               11     4.3%
4029               10     3.9%
4034               9      3.5%
4054               9      3.5%
4042               9      3.5%
4028               8      3.1%
4032               7      2.8%
4047               7      2.8%
4031               7      2.8%
Other values (60)  164    64.6%

Value              Count  Frequency (%)
2000               1      0.4%
2008               1      0.4%
3010               2      0.8%
4001               1      0.4%
4006               1      0.4%
4008               2      0.8%
4010               2      0.8%
4011               1      0.4%
4013               1      0.4%
4014               2      0.8%

Value              Count  Frequency (%)
4093               1      0.4%
4086               1      0.4%
4079               1      0.4%
4077               2      0.8%
4074               3      1.2%
4073               1      0.4%
4072               1      0.4%
4071               4      1.6%
4070               2      0.8%
4069               1      0.4%

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Memory size16.5 KiB
M
181 
F
73 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters254
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowF
3rd rowF
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
M181
71.3%
F73
28.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m181
71.3%
f73
28.7%

Most occurring characters

ValueCountFrequency (%)
M181
71.3%
F73
28.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter254
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M181
71.3%
F73
28.7%

Most occurring scripts

ValueCountFrequency (%)
Latin254
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M181
71.3%
F73
28.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M181
71.3%
F73
28.7%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size14.5 KiB
6
198 
5
49 
9
 
7

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters254
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row6
2nd row5
3rd row5
4th row6
5th row6

Common Values

ValueCountFrequency (%)
6198
78.0%
549
 
19.3%
97
 
2.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6198
78.0%
549
 
19.3%
97
 
2.8%

Most occurring characters

ValueCountFrequency (%)
6198
78.0%
549
 
19.3%
97
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number254
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6198
78.0%
549
 
19.3%
97
 
2.8%

Most occurring scripts

ValueCountFrequency (%)
Common254
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6198
78.0%
549
 
19.3%
97
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6198
78.0%
549
 
19.3%
97
 
2.8%

CS_RACA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.8%
Missing0
Missing (%)0.0%
Memory size16.5 KiB
1
78 
9
77 
4
61 
2
31 
 
4
Other values (2)
 
3

Length

Max length1
Median length1
Mean length0.9842519685
Min length0

Characters and Unicode

Total characters250
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row1
2nd row1
3rd row9
4th row4
5th row4

Common Values

ValueCountFrequency (%)
178
30.7%
977
30.3%
461
24.0%
231
 
12.2%
4
 
1.6%
32
 
0.8%
51
 
0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
178
31.2%
977
30.8%
461
24.4%
231
 
12.4%
32
 
0.8%
51
 
0.4%

Most occurring characters

ValueCountFrequency (%)
178
31.2%
977
30.8%
461
24.4%
231
 
12.4%
32
 
0.8%
51
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number250
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
178
31.2%
977
30.8%
461
24.4%
231
 
12.4%
32
 
0.8%
51
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Common250
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
178
31.2%
977
30.8%
461
24.4%
231
 
12.4%
32
 
0.8%
51
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII250
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
178
31.2%
977
30.8%
461
24.4%
231
 
12.4%
32
 
0.8%
51
 
0.4%

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct11
Distinct (%)4.3%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
08
116 
09
38 
06
31 
07
21 
13 
Other values (6)
35 

Length

Max length2
Median length2
Mean length1.897637795
Min length0

Characters and Unicode

Total characters482
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row08
2nd row08
3rd row08
4th row08
5th row05

Common Values

ValueCountFrequency (%)
08116
45.7%
0938
 
15.0%
0631
 
12.2%
0721
 
8.3%
13
 
5.1%
0510
 
3.9%
0410
 
3.9%
106
 
2.4%
035
 
2.0%
013
 
1.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
08116
48.1%
0938
 
15.8%
0631
 
12.9%
0721
 
8.7%
0510
 
4.1%
0410
 
4.1%
106
 
2.5%
035
 
2.1%
013
 
1.2%
021
 
0.4%

Most occurring characters

ValueCountFrequency (%)
0241
50.0%
8116
24.1%
938
 
7.9%
631
 
6.4%
721
 
4.4%
510
 
2.1%
410
 
2.1%
19
 
1.9%
35
 
1.0%
21
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number482
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0241
50.0%
8116
24.1%
938
 
7.9%
631
 
6.4%
721
 
4.4%
510
 
2.1%
410
 
2.1%
19
 
1.9%
35
 
1.0%
21
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common482
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0241
50.0%
8116
24.1%
938
 
7.9%
631
 
6.4%
721
 
4.4%
510
 
2.1%
410
 
2.1%
19
 
1.9%
35
 
1.0%
21
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII482
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0241
50.0%
8116
24.1%
938
 
7.9%
631
 
6.4%
721
 
4.4%
510
 
2.1%
410
 
2.1%
19
 
1.9%
35
 
1.0%
21
 
0.2%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.8 KiB
33
254 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters508
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33254
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33254
100.0%

Most occurring characters

ValueCountFrequency (%)
3508
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number508
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3508
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common508
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3508
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII508
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3508
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 28
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 330392.4134
Minimum: 330010
Maximum: 330630
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.1 KiB

Quantile statistics

Minimum: 330010
5-th percentile: 330020
Q1: 330390
median: 330455
Q3: 330455
95-th percentile: 330455
Maximum: 330630
Range: 620
Interquartile range (IQR): 65

Descriptive statistics

Standard deviation: 132.0325687
Coefficient of variation (CV): 0.0003996234881
Kurtosis: 2.42049498
Mean: 330392.4134
Median Absolute Deviation (MAD): 0
Skewness: -1.872729993
Sum: 83919673
Variance: 17432.59919
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
Value              Count  Frequency (%)
330455             172    67.7%
330020             14     5.5%
330340             10     3.9%
330330             8      3.1%
330390             7      2.8%
330170             7      2.8%
330240             6      2.4%
330490             5      2.0%
330350             3      1.2%
330080             2      0.8%
Other values (18)  20     7.9%

Value              Count  Frequency (%)
330010             1      0.4%
330020             14     5.5%
330040             1      0.4%
330045             1      0.4%
330050             1      0.4%
330070             1      0.4%
330080             2      0.8%
330120             1      0.4%
330130             2      0.8%
330170             7      2.8%

Value              Count  Frequency (%)
330630             1      0.4%
330600             1      0.4%
330580             1      0.4%
330510             1      0.4%
330490             5      2.0%
330455             172    67.7%
330452             1      0.4%
330450             1      0.4%
330430             1      0.4%
330414             1      0.4%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size15.3 KiB
254 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
254
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size14.5 KiB
1
254 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters254
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1254
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1254
100.0%

Most occurring characters

ValueCountFrequency (%)
1254
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number254
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1254
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common254
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1254
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII254
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1254
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing254
Missing (%)100.0%
Memory size2.1 KiB

ID_OCUPA_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct49
Distinct (%)19.3%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
152 
999991
26 
263110
 
6
999993
 
5
999992
 
4
Other values (44)
61 

Length

Max length6
Median length0
Mean length2.409448819
Min length0

Characters and Unicode

Total characters612
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique31 ?
Unique (%)12.2%

Sample

1st row223605
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
152
59.8%
99999126
 
10.2%
2631106
 
2.4%
9999935
 
2.0%
9999924
 
1.6%
9999944
 
1.6%
7114053
 
1.2%
2211053
 
1.2%
9989992
 
0.8%
2231152
 
0.8%
Other values (39)47
 
18.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
99999126
25.5%
2631106
 
5.9%
9999935
 
4.9%
9999924
 
3.9%
9999944
 
3.9%
2211053
 
2.9%
7114053
 
2.9%
2231152
 
2.0%
7825052
 
2.0%
9989992
 
2.0%
Other values (38)45
44.1%

Most occurring characters

ValueCountFrequency (%)
9207
33.8%
1108
17.6%
292
15.0%
058
 
9.5%
553
 
8.7%
341
 
6.7%
422
 
3.6%
612
 
2.0%
712
 
2.0%
87
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number612
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9207
33.8%
1108
17.6%
292
15.0%
058
 
9.5%
553
 
8.7%
341
 
6.7%
422
 
3.6%
612
 
2.0%
712
 
2.0%
87
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Common612
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9207
33.8%
1108
17.6%
292
15.0%
058
 
9.5%
553
 
8.7%
341
 
6.7%
422
 
3.6%
612
 
2.0%
712
 
2.0%
87
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII612
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9207
33.8%
1108
17.6%
292
15.0%
058
 
9.5%
553
 
8.7%
341
 
6.7%
422
 
3.6%
612
 
2.0%
712
 
2.0%
87
 
1.1%

CLASSI_FIN
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size16.5 KiB
2
190 
1
62 
 
2

Length

Max length1
Median length1
Mean length0.9921259843
Min length0

Characters and Unicode

Total characters252
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row1

Common Values

Value           Count  Frequency (%)
2                 190  74.8%
1                  62  24.4%
(empty string)      2   0.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2190
75.4%
162
 
24.6%

Most occurring characters

ValueCountFrequency (%)
2190
75.4%
162
 
24.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number252
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2190
75.4%
162
 
24.6%

Most occurring scripts

ValueCountFrequency (%)
Common252
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2190
75.4%
162
 
24.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII252
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2190
75.4%
162
 
24.6%

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct11
Distinct (%)4.3%
Missing0
Missing (%)0.0%
Memory size15.1 KiB
10
120 
11
68 
4
40 
99
 
7
1
 
4
Other values (6)
15 

Length

Max length2
Median length2
Mean length1.771653543
Min length0

Characters and Unicode

Total characters450
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique1 ?
Unique (%)0.4%

Sample

1st row10
2nd row10
3rd row10
4th row10
5th row5

Common Values

Value           Count  Frequency (%)
10                120  47.2%
11                 68  26.8%
4                  40  15.7%
99                  7   2.8%
1                   4   1.6%
5                   4   1.6%
12                  3   1.2%
3                   3   1.2%
9                   2   0.8%
(empty string)      2   0.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
10120
47.6%
1168
27.0%
440
 
15.9%
997
 
2.8%
14
 
1.6%
54
 
1.6%
123
 
1.2%
33
 
1.2%
92
 
0.8%
71
 
0.4%

Most occurring characters

ValueCountFrequency (%)
1263
58.4%
0120
26.7%
440
 
8.9%
916
 
3.6%
54
 
0.9%
33
 
0.7%
23
 
0.7%
71
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number450
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1263
58.4%
0120
26.7%
440
 
8.9%
916
 
3.6%
54
 
0.9%
33
 
0.7%
23
 
0.7%
71
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common450
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1263
58.4%
0120
26.7%
440
 
8.9%
916
 
3.6%
54
 
0.9%
33
 
0.7%
23
 
0.7%
71
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII450
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1263
58.4%
0120
26.7%
440
 
8.9%
916
 
3.6%
54
 
0.9%
33
 
0.7%
23
 
0.7%
71
 
0.2%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size16.5 KiB
1
148 
2
55 
3
49 
 
2

Length

Max length1
Median length1
Mean length0.9921259843
Min length0

Characters and Unicode

Total characters252
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value           Count  Frequency (%)
1                 148  58.3%
2                  55  21.7%
3                  49  19.3%
(empty string)      2   0.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1148
58.7%
255
 
21.8%
349
 
19.4%

Most occurring characters

ValueCountFrequency (%)
1148
58.7%
255
 
21.8%
349
 
19.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number252
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1148
58.7%
255
 
21.8%
349
 
19.4%

Most occurring scripts

ValueCountFrequency (%)
Common252
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1148
58.7%
255
 
21.8%
349
 
19.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII252
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1148
58.7%
255
 
21.8%
349
 
19.4%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct3
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Memory size16.5 KiB
1
204 
2
48 
 
2

Length

Max length1
Median length1
Mean length0.9921259843
Min length0

Characters and Unicode

Total characters252
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

Value           Count  Frequency (%)
1                 204  80.3%
2                  48  18.9%
(empty string)      2   0.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1204
81.0%
248
 
19.0%

Most occurring characters

ValueCountFrequency (%)
1204
81.0%
248
 
19.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number252
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1204
81.0%
248
 
19.0%

Most occurring scripts

ValueCountFrequency (%)
Common252
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1204
81.0%
248
 
19.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII252
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1204
81.0%
248
 
19.0%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size15.6 KiB
192 
2
50 
1
 
7
3
 
5

Length

Max length1
Median length0
Mean length0.2440944882
Min length0

Characters and Unicode

Total characters62
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row2

Common Values

Value           Count  Frequency (%)
(empty string)    192  75.6%
2                  50  19.7%
1                   7   2.8%
3                   5   2.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
250
80.6%
17
 
11.3%
35
 
8.1%

Most occurring characters

ValueCountFrequency (%)
250
80.6%
17
 
11.3%
35
 
8.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number62
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
250
80.6%
17
 
11.3%
35
 
8.1%

Most occurring scripts

ValueCountFrequency (%)
Common62
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
250
80.6%
17
 
11.3%
35
 
8.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII62
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
250
80.6%
17
 
11.3%
35
 
8.1%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.8%
Missing0
Missing (%)0.0%
Memory size15.6 KiB
217 
RJ
 
21
AM
 
11
RO
 
2
RR
 
1
Other values (2)
 
2

Length

Max length2
Median length0
Mean length0.2913385827
Min length0

Characters and Unicode

Total characters74
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique3 ?
Unique (%)1.2%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value           Count  Frequency (%)
(empty string)    217  85.4%
RJ                 21   8.3%
AM                 11   4.3%
RO                  2   0.8%
RR                  1   0.4%
AC                  1   0.4%
AP                  1   0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
rj21
56.8%
am11
29.7%
ro2
 
5.4%
rr1
 
2.7%
ac1
 
2.7%
ap1
 
2.7%

Most occurring characters

ValueCountFrequency (%)
R25
33.8%
J21
28.4%
A13
17.6%
M11
14.9%
O2
 
2.7%
C1
 
1.4%
P1
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter74
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R25
33.8%
J21
28.4%
A13
17.6%
M11
14.9%
O2
 
2.7%
C1
 
1.4%
P1
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Latin74
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
R25
33.8%
J21
28.4%
A13
17.6%
M11
14.9%
O2
 
2.7%
C1
 
1.4%
P1
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII74
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R25
33.8%
J21
28.4%
A13
17.6%
M11
14.9%
O2
 
2.7%
C1
 
1.4%
P1
 
1.4%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct          9
Distinct (%)      3.5%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              9.728346457
Minimum           0
Maximum           190
Zeros             197
Zeros (%)         77.6%
Negative          0
Negative (%)      0.0%
Memory size       2.1 KiB

Quantile statistics

Minimum                     0
5-th percentile             0
Q1                          0
median                      0
Q3                          0
95-th percentile            110.05
Maximum                     190
Range                       190
Interquartile range (IQR)   0

Descriptive statistics

Standard deviation                36.625131
Coefficient of variation (CV)     3.764784814
Kurtosis                          13.98912955
Mean                              9.728346457
Median Absolute Deviation (MAD)   0
Skewness                          3.899908227
Sum                               2471
Variance                          1341.400221
Monotonicity                      Not monotonic
Histogram with fixed size bins (bins=9)
Common values
Value  Count  Frequency (%)
0        197  77.6%
1         37  14.6%
177        6   2.4%
31         5   2.0%
149        3   1.2%
112        2   0.8%
109        2   0.8%
190        1   0.4%
138        1   0.4%

Minimum values (ascending)
Value  Count  Frequency (%)
0        197  77.6%
1         37  14.6%
31         5   2.0%
109        2   0.8%
112        2   0.8%
138        1   0.4%
149        3   1.2%
177        6   2.4%
190        1   0.4%

Maximum values (descending)
Value  Count  Frequency (%)
190        1   0.4%
177        6   2.4%
149        3   1.2%
138        1   0.4%
112        2   0.8%
109        2   0.8%
31         5   2.0%
1         37  14.6%
0        197  77.6%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct18
Distinct (%)7.1%
Missing0
Missing (%)0.0%
Memory size15.3 KiB
217 
330390
 
6
130260
 
5
130380
 
5
330185
 
3
Other values (13)
 
18

Length

Max length6
Median length0
Mean length0.874015748
Min length0

Characters and Unicode

Total characters222
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique10 ?
Unique (%)3.9%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value             Count  Frequency (%)
(empty string)      217  85.4%
330390                6   2.4%
130260                5   2.0%
130380                5   2.0%
330185                3   1.2%
330240                3   1.2%
330340                3   1.2%
330080                2   0.8%
330290                1   0.4%
330010                1   0.4%
Other values (8)      8   3.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
3303906
16.2%
1302605
13.5%
1303805
13.5%
3301853
8.1%
3302403
8.1%
3303403
8.1%
3300802
 
5.4%
1100201
 
2.7%
1303531
 
2.7%
3305801
 
2.7%
Other values (7)7
18.9%

Most occurring characters

ValueCountFrequency (%)
075
33.8%
373
32.9%
124
 
10.8%
212
 
5.4%
811
 
5.0%
47
 
3.2%
57
 
3.2%
97
 
3.2%
66
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number222
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
075
33.8%
373
32.9%
124
 
10.8%
212
 
5.4%
811
 
5.0%
47
 
3.2%
57
 
3.2%
97
 
3.2%
66
 
2.7%

Most occurring scripts

ValueCountFrequency (%)
Common222
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
075
33.8%
373
32.9%
124
 
10.8%
212
 
5.4%
811
 
5.0%
47
 
3.2%
57
 
3.2%
97
 
3.2%
66
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII222
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
075
33.8%
373
32.9%
124
 
10.8%
212
 
5.4%
811
 
5.0%
47
 
3.2%
57
 
3.2%
97
 
3.2%
66
 
2.7%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct18
Distinct (%)7.1%
Missing0
Missing (%)0.0%
Memory size15.3 KiB
233 
SANA
 
3
BURK
 
2
VALE
 
2
SAO
 
1
Other values (13)
 
13

Length

Max length4
Median length0
Mean length0.3267716535
Min length0

Characters and Unicode

Total characters83
Distinct characters19
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique14 ?
Unique (%)5.5%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value             Count  Frequency (%)
(empty string)      233  91.7%
SANA                  3   1.2%
BURK                  2   0.8%
VALE                  2   0.8%
SAO                   1   0.4%
MUNI                  1   0.4%
ESTR                  1   0.4%
PARQ                  1   0.4%
CUPI                  1   0.4%
GARR                  1   0.4%
Other values (8)      8   3.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
sana3
14.3%
burk2
 
9.5%
vale2
 
9.5%
nige1
 
4.8%
uaga1
 
4.8%
muni1
 
4.8%
cupi1
 
4.8%
siti1
 
4.8%
mala1
 
4.8%
sao1
 
4.8%
Other values (7)7
33.3%

Most occurring characters

ValueCountFrequency (%)
A18
21.7%
R8
9.6%
S7
 
8.4%
U6
 
7.2%
I6
 
7.2%
N6
 
7.2%
E5
 
6.0%
L4
 
4.8%
M4
 
4.8%
C3
 
3.6%
Other values (9)16
19.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter83
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A18
21.7%
R8
9.6%
S7
 
8.4%
U6
 
7.2%
I6
 
7.2%
N6
 
7.2%
E5
 
6.0%
L4
 
4.8%
M4
 
4.8%
C3
 
3.6%
Other values (9)16
19.3%

Most occurring scripts

ValueCountFrequency (%)
Latin83
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A18
21.7%
R8
9.6%
S7
 
8.4%
U6
 
7.2%
I6
 
7.2%
N6
 
7.2%
E5
 
6.0%
L4
 
4.8%
M4
 
4.8%
C3
 
3.6%
Other values (9)16
19.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII83
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A18
21.7%
R8
9.6%
S7
 
8.4%
U6
 
7.2%
I6
 
7.2%
N6
 
7.2%
E5
 
6.0%
L4
 
4.8%
M4
 
4.8%
C3
 
3.6%
Other values (9)16
19.3%

DEXAME
Categorical

HIGH CARDINALITY

Distinct153
Distinct (%)60.2%
Missing0
Missing (%)0.0%
Memory size16.7 KiB
2017-08-11
 
10
2017-01-27
 
6
2017-02-16
 
4
2017-03-29
 
4
2017-01-11
 
3
Other values (148)
227 

Length

Max length10
Median length10
Mean length9.952755906
Min length4

Characters and Unicode

Total characters2528
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique87 ?
Unique (%)34.3%

Sample

1st row2017-01-05
2nd row2017-01-05
3rd row2017-01-11
4th row2017-01-11
5th row2017-01-11

Common Values

Value               Count  Frequency (%)
2017-08-11             10   3.9%
2017-01-27              6   2.4%
2017-02-16              4   1.6%
2017-03-29              4   1.6%
2017-01-11              3   1.2%
2017-12-01              3   1.2%
2017-09-25              3   1.2%
2017-12-18              3   1.2%
2017-07-28              3   1.2%
2017-09-06              3   1.2%
Other values (143)    212  83.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2017-08-1110
 
3.9%
2017-01-276
 
2.4%
2017-03-294
 
1.6%
2017-02-164
 
1.6%
2017-07-063
 
1.2%
2017-08-183
 
1.2%
2017-12-013
 
1.2%
2017-12-183
 
1.2%
2017-04-073
 
1.2%
2017-01-113
 
1.2%
Other values (143)212
83.5%

Most occurring characters

ValueCountFrequency (%)
0552
21.8%
-504
19.9%
1478
18.9%
2405
16.0%
7299
11.8%
365
 
2.6%
859
 
2.3%
442
 
1.7%
640
 
1.6%
940
 
1.6%
Other values (5)44
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2016
79.7%
Dash Punctuation504
 
19.9%
Lowercase Letter6
 
0.2%
Uppercase Letter2
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0552
27.4%
1478
23.7%
2405
20.1%
7299
14.8%
365
 
3.2%
859
 
2.9%
442
 
2.1%
640
 
2.0%
940
 
2.0%
536
 
1.8%
Lowercase Letter
ValueCountFrequency (%)
o2
33.3%
n2
33.3%
e2
33.3%
Dash Punctuation
ValueCountFrequency (%)
-504
100.0%
Uppercase Letter
ValueCountFrequency (%)
N2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2520
99.7%
Latin8
 
0.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0552
21.9%
-504
20.0%
1478
19.0%
2405
16.1%
7299
11.9%
365
 
2.6%
859
 
2.3%
442
 
1.7%
640
 
1.6%
940
 
1.6%
Latin
ValueCountFrequency (%)
N2
25.0%
o2
25.0%
n2
25.0%
e2
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2528
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0552
21.8%
-504
19.9%
1478
18.9%
2405
16.0%
7299
11.8%
365
 
2.6%
859
 
2.3%
442
 
1.7%
640
 
1.6%
940
 
1.6%
Other values (5)44
 
1.7%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)1.6%
Missing0
Missing (%)0.0%
Memory size16.5 KiB
1
190 
4
43 
2
 
19
 
2

Length

Max length1
Median length1
Mean length0.9921259843
Min length0

Characters and Unicode

Total characters252
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row4

Common Values

Value           Count  Frequency (%)
1                 190  74.8%
4                  43  16.9%
2                  19   7.5%
(empty string)      2   0.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1190
75.4%
443
 
17.1%
219
 
7.5%

Most occurring characters

ValueCountFrequency (%)
1190
75.4%
443
 
17.1%
219
 
7.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number252
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1190
75.4%
443
 
17.1%
219
 
7.5%

Most occurring scripts

ValueCountFrequency (%)
Common252
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1190
75.4%
443
 
17.1%
219
 
7.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII252
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1190
75.4%
443
 
17.1%
219
 
7.5%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct          46
Distinct (%)      85.2%
Missing           200
Missing (%)       78.7%
Infinite          0
Infinite (%)      0.0%
Mean              18011.22222
Minimum           2
Maximum           288000
Zeros             0
Zeros (%)         0.0%
Negative          0
Negative (%)      0.0%
Memory size       2.1 KiB

Quantile statistics

Minimum                     2
5-th percentile             27.8
Q1                          201.25
median                      424
Q3                          2800
95-th percentile            120738
Maximum                     288000
Range                       287998
Interquartile range (IQR)   2598.75

Descriptive statistics

Standard deviation                51870.45262
Coefficient of variation (CV)     2.879896321
Kurtosis                          15.43081644
Mean                              18011.22222
Median Absolute Deviation (MAD)   398
Skewness                          3.791800502
Sum                               972606
Variance                          2690543855
Monotonicity                      Not monotonic
Histogram with fixed size bins (bins=46)
Common values
Value              Count  Frequency (%)
210                    4   1.6%
200                    4   1.6%
32                     2   0.8%
480                    2   0.8%
120                    1   0.4%
10                     1   0.4%
1960                   1   0.4%
20                     1   0.4%
2400                   1   0.4%
256                    1   0.4%
Other values (36)     36  14.2%
(Missing)            200  78.7%

Minimum values (ascending)
Value  Count  Frequency (%)
2          1  0.4%
10         1  0.4%
20         1  0.4%
32         2  0.8%
43         1  0.4%
80         1  0.4%
112        1  0.4%
120        1  0.4%
160        1  0.4%
200        4  1.6%

Maximum values (descending)
Value   Count  Frequency (%)
288000      1  0.4%
181500      1  0.4%
140680      1  0.4%
110000      1  0.4%
100030      1  0.4%
24240       1  0.4%
21800       1  0.4%
20005       1  0.4%
20000       1  0.4%
13360       1  0.4%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.8%
Missing0
Missing (%)0.0%
Memory size15.6 KiB
192 
2
 
16
4
 
15
1
 
14
5
 
6
Other values (2)
 
11

Length

Max length1
Median length0
Mean length0.2440944882
Min length0

Characters and Unicode

Total characters62
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row3

Common Values

Value           Count  Frequency (%)
(empty string)    192  75.6%
2                  16   6.3%
4                  15   5.9%
1                  14   5.5%
5                   6   2.4%
3                   6   2.4%
6                   5   2.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
216
25.8%
415
24.2%
114
22.6%
56
 
9.7%
36
 
9.7%
65
 
8.1%

Most occurring characters

ValueCountFrequency (%)
216
25.8%
415
24.2%
114
22.6%
36
 
9.7%
56
 
9.7%
65
 
8.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number62
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
216
25.8%
415
24.2%
114
22.6%
36
 
9.7%
56
 
9.7%
65
 
8.1%

Most occurring scripts

ValueCountFrequency (%)
Common62
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
216
25.8%
415
24.2%
114
22.6%
36
 
9.7%
56
 
9.7%
65
 
8.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII62
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
216
25.8%
415
24.2%
114
22.6%
36
 
9.7%
56
 
9.7%
65
 
8.1%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct7
Distinct (%)2.8%
Missing0
Missing (%)0.0%
Memory size15.7 KiB
193 
1
34 
99
 
12
11
 
9
12
 
4
Other values (2)
 
2

Length

Max length2
Median length0
Mean length0.3385826772
Min length0

Characters and Unicode

Total characters86
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?

Unique

Unique2 ?
Unique (%)0.8%

Sample

1st row
2nd row
3rd row
4th row
5th row99

Common Values

Value           Count  Frequency (%)
(empty string)    193  76.0%
1                  34  13.4%
99                 12   4.7%
11                  9   3.5%
12                  4   1.6%
5                   1   0.4%
2                   1   0.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
134
55.7%
9912
 
19.7%
119
 
14.8%
124
 
6.6%
51
 
1.6%
21
 
1.6%

Most occurring characters

ValueCountFrequency (%)
156
65.1%
924
27.9%
25
 
5.8%
51
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number86
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
156
65.1%
924
27.9%
25
 
5.8%
51
 
1.2%

Most occurring scripts

ValueCountFrequency (%)
Common86
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
156
65.1%
924
27.9%
25
 
5.8%
51
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII86
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
156
65.1%
924
27.9%
25
 
5.8%
51
 
1.2%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct11
Distinct (%)4.3%
Missing0
Missing (%)0.0%
Memory size15.5 KiB
243 
ARTESUNATO+CLINDAMICINA
 
2
ARTESUNATO + CLINDAMICINA
 
1
CLOROQUIN PA 3 D PRIMAQUINA
 
1
PRIMAQUINA DIFOSFATO 15MG
 
1
Other values (6)
 
6

Length

Max length30
Median length0
Mean length1.12992126
Min length0

Characters and Unicode

Total characters287
Distinct characters27
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique9 ?
Unique (%)3.5%

Sample

1st row
2nd row
3rd row
4th row
5th rowCLOROQ 3 DIAS PRIMAQ 14 DIAS

Common Values

Value                            Count  Frequency (%)
(empty string)                     243  95.7%
ARTESUNATO+CLINDAMICINA              2   0.8%
ARTESUNATO + CLINDAMICINA            1   0.4%
CLOROQUIN PA 3 D PRIMAQUINA          1   0.4%
PRIMAQUINA DIFOSFATO 15MG            1   0.4%
3DIAS CLOROQ,10 DIAS PRIMAQ          1   0.4%
10 CP CLOROQUINA+14 CP PRIMAQU       1   0.4%
CLOROQ 3 DIAS PRIMAQ 14 DIAS         1   0.4%
CLINDAMICINA + ARTESUNATO            1   0.4%
ARTEMETHER+LUM E PRIMAQ              1   0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
primaq3
 
7.9%
dias3
 
7.9%
cp2
 
5.3%
artesunato+clindamicina2
 
5.3%
2
 
5.3%
primaqu2
 
5.3%
32
 
5.3%
clindamicina2
 
5.3%
primaquina2
 
5.3%
artesunato2
 
5.3%
Other values (16)16
42.1%

Most occurring characters

ValueCountFrequency (%)
A34
 
11.8%
I29
 
10.1%
28
 
9.8%
O18
 
6.3%
R18
 
6.3%
C17
 
5.9%
N17
 
5.9%
M16
 
5.6%
Q12
 
4.2%
U12
 
4.2%
Other values (17)86
30.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter235
81.9%
Space Separator28
 
9.8%
Decimal Number17
 
5.9%
Math Symbol6
 
2.1%
Other Punctuation1
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A34
14.5%
I29
12.3%
O18
 
7.7%
R18
 
7.7%
C17
 
7.2%
N17
 
7.2%
M16
 
6.8%
Q12
 
5.1%
U12
 
5.1%
T11
 
4.7%
Other values (8)51
21.7%
Decimal Number
ValueCountFrequency (%)
16
35.3%
04
23.5%
33
17.6%
42
 
11.8%
21
 
5.9%
51
 
5.9%
Space Separator
ValueCountFrequency (%)
28
100.0%
Other Punctuation
ValueCountFrequency (%)
,1
100.0%
Math Symbol
ValueCountFrequency (%)
+6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin235
81.9%
Common52
 
18.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A34
14.5%
I29
12.3%
O18
 
7.7%
R18
 
7.7%
C17
 
7.2%
N17
 
7.2%
M16
 
6.8%
Q12
 
5.1%
U12
 
5.1%
T11
 
4.7%
Other values (8)51
21.7%
Common
ValueCountFrequency (%)
28
53.8%
16
 
11.5%
+6
 
11.5%
04
 
7.7%
33
 
5.8%
42
 
3.8%
,1
 
1.9%
21
 
1.9%
51
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII287
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A34
 
11.8%
I29
 
10.1%
28
 
9.8%
O18
 
6.3%
R18
 
6.3%
C17
 
5.9%
N17
 
5.9%
M16
 
5.6%
Q12
 
4.2%
U12
 
4.2%
Other values (17)86
30.0%

DTRATA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct50
Distinct (%)19.7%
Missing0
Missing (%)0.0%
Memory size15.6 KiB
None
192 
2017-01-27
 
4
2017-07-04
 
3
2017-02-16
 
3
2017-05-12
 
2
Other values (45)
50 

Length

Max length10
Median length4
Mean length5.464566929
Min length4

Characters and Unicode

Total characters1388
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique40 ?
Unique (%)15.7%

Sample

1st rowNone
2nd rowNone
3rd rowNone
4th rowNone
5th row2017-01-11

Common Values

Value              Count  Frequency (%)
None                 192  75.6%
2017-01-27             4   1.6%
2017-07-04             3   1.2%
2017-02-16             3   1.2%
2017-05-12             2   0.8%
2017-11-09             2   0.8%
2017-04-12             2   0.8%
2017-09-06             2   0.8%
2017-07-28             2   0.8%
2017-10-27             2   0.8%
Other values (40)     40  15.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none192
75.6%
2017-01-274
 
1.6%
2017-07-043
 
1.2%
2017-02-163
 
1.2%
2017-11-092
 
0.8%
2017-07-282
 
0.8%
2017-10-272
 
0.8%
2017-09-062
 
0.8%
2017-05-122
 
0.8%
2017-04-122
 
0.8%
Other values (40)40
 
15.7%

Most occurring characters

ValueCountFrequency (%)
N192
13.8%
o192
13.8%
n192
13.8%
e192
13.8%
0139
10.0%
-124
8.9%
1116
8.4%
299
7.1%
779
5.7%
317
 
1.2%
Other values (5)46
 
3.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter576
41.5%
Decimal Number496
35.7%
Uppercase Letter192
 
13.8%
Dash Punctuation124
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0139
28.0%
1116
23.4%
299
20.0%
779
15.9%
317
 
3.4%
613
 
2.6%
413
 
2.6%
89
 
1.8%
96
 
1.2%
55
 
1.0%
Lowercase Letter
ValueCountFrequency (%)
o192
33.3%
n192
33.3%
e192
33.3%
Uppercase Letter
ValueCountFrequency (%)
N192
100.0%
Dash Punctuation
ValueCountFrequency (%)
-124
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin768
55.3%
Common620
44.7%

Most frequent character per script

Common
ValueCountFrequency (%)
0139
22.4%
-124
20.0%
1116
18.7%
299
16.0%
779
12.7%
317
 
2.7%
613
 
2.1%
413
 
2.1%
89
 
1.5%
96
 
1.0%
Latin
ValueCountFrequency (%)
N192
25.0%
o192
25.0%
n192
25.0%
e192
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1388
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N192
13.8%
o192
13.8%
n192
13.8%
e192
13.8%
0139
10.0%
-124
8.9%
1116
8.4%
299
7.1%
779
5.7%
317
 
1.2%
Other values (5)46
 
3.3%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing254
Missing (%)100.0%
Memory size2.1 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the angle of the line to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
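
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report generator runs; the function name `pearson_r` and the two sample lists are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel between numerator and denominator,
    # so plain sums of products are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical sample values, for illustration only.
ages = [26, 54, 34, 60, 29]
weeks = [1, 51, 1, 52, 2]
r = pearson_r(ages, weeks)
# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * a + 10 for a in ages], weeks)
```

A constant column has zero standard deviation, so r is undefined for it; the sketch raises an error in that case rather than returning a number.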

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
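
As a rough sketch of that recipe, the snippet below ranks both variables and then applies the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `spearman_rho`) are hypothetical, and giving tied values the average of their positions is an assumption about tie handling rather than something the text above specifies.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson's formula applied to the rank variables of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A monotonic but nonlinear relationship still gives rho = 1.
xs = [1, 2, 3, 4, 5]
rho = spearman_rho(xs, [x ** 2 for x in xs])
```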

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
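
The pair-counting definition above translates directly into code. The sketch below implements exactly that simple formula (often called tau-a); statistical libraries frequently use a tie-adjusted variant instead, so results can differ when the data contain ties. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in X or Y: counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Three observations: two concordant pairs, one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```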

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
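
A minimal sketch of a bias-corrected Cramér's V follows, assuming the commonly cited form of Bergsma's correction (φ² and the row and column counts are each adjusted by terms in 1/(n-1)). The function name `cramers_v_corrected` is hypothetical, and returning 0.0 for a degenerate table is a choice made here for illustration.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    cells = Counter(zip(xs, ys))          # observed contingency counts
    rows, cols = Counter(xs), Counter(ys)  # marginal totals
    r, k = len(rows), len(cols)
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Correction terms (assumed form of Bergsma's 2013 proposal).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # assumption: a constant column carries no association
    return math.sqrt(phi2_corr / denom)

# Perfectly associated labels give V close to 1; independent labels give 0.
v_perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```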

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
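
Several categorical variables in this report show Missing 0 even though most of their values are empty strings or the literal text None (DTRATA, for example, lists None 192 times with Missing 0). The sketch below counts nullity per column in two ways: strictly (only real nulls and NaN) and leniently (also treating such placeholders as missing). The helper names, the placeholder list, and the three sample rows are assumptions for illustration, not part of the report.

```python
import math

PLACEHOLDERS = ("", "None")  # assumption: strings that stand in for "no value"

def is_missing(value, placeholders=PLACEHOLDERS):
    """True for None, NaN, or a string equal to one of the placeholders."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() in placeholders

def nullity_by_column(rows, placeholders=PLACEHOLDERS):
    """Count missing cells per column for a list of dict records."""
    columns = list(rows[0]) if rows else []
    return {
        col: sum(is_missing(row.get(col), placeholders) for row in rows)
        for col in columns
    }

# Hypothetical records shaped like a few columns of the sample rows.
rows = [
    {"RESULT": "1", "PMM": float("nan"), "DTRATA": "None"},
    {"RESULT": "4", "PMM": 310.0, "DTRATA": "2017-01-11"},
    {"RESULT": "", "PMM": float("nan"), "DTRATA": "None"},
]
strict = nullity_by_column(rows, placeholders=())  # real nulls only
lenient = nullity_by_column(rows)                  # placeholders count as missing
```

Comparing the two counts per column is a quick way to see where the nullity plots understate how incomplete a field really is.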

Sample

First rows

TP_NOTID_AGRAVODT_NOTIFICSEM_NOTNU_ANOSG_UF_NOTID_MUNICIPID_REGIONAID_UNIDADEDT_SIN_PRISEM_PRIDT_NASCNU_IDADE_NCS_SEXOCS_GESTANTCS_RACACS_ESCOL_NSG_UFID_MN_RESIID_RG_RESIID_PAISDT_INVESTID_OCUPA_NCLASSI_FINAT_ATIVIDAAT_LAMINAAT_SINTOMATPAUTOCTOCOUFINFCOPAISINFCOMUNINFLOC_INFDEXAMERESULTPMMPCRUZTRA_ESQUEMDSTRAESQUEDTRATADT_ENCERRA
02B542017-01-0520170120173333045530059922017-01-042017011990-12-284026M6108333304551NaT2236052101102017-01-051NaNNoneNaT
12B542017-01-0520170120173333045578593412016-12-242016511962-12-084054F5108333304551NaT2101102017-01-051NaNNoneNaT
22B542017-01-1120170220173333045522883382017-01-062017011982-09-014034F5908333304551NaT2101102017-01-111NaNNoneNaT
32B542017-01-1120170220173333045522883382016-12-302016521956-11-274060M6408333304551NaT2101102017-01-111NaNNoneNaT
42B542017-01-1120170220173333045522702772017-01-102017021987-11-094029M6405333304551NaT151121382017-01-114310.0399CLOROQ 3 DIAS PRIMAQ 14 DIAS2017-01-11NaT
52B542017-01-1220170220173333045522883382017-01-122017021974-11-014042M6908333304551NaT2153052101102017-01-121NaNNoneNaT
62B542017-01-1320170220173333045522883382016-12-232016511966-03-254050M6106333304551NaT2101102017-01-131NaNNoneNaT
72B542017-01-1320170220173333045522883382017-01-082017021969-10-184047M6208333304551NaT1101121772017-01-13220000.05122017-01-13NaT
82B542017-01-1420170220173333045522883382017-01-132017021969-04-224047M6909333304501NaT2101102017-01-141NaNNoneNaT
92B542017-01-1620170320173333045522702772017-01-102017021987-11-094029M6405333304551NaT253202017-01-161NaNNoneNaT

Last rows

TP_NOTID_AGRAVODT_NOTIFICSEM_NOTNU_ANOSG_UF_NOTID_MUNICIPID_REGIONAID_UNIDADEDT_SIN_PRISEM_PRIDT_NASCNU_IDADE_NCS_SEXOCS_GESTANTCS_RACACS_ESCOL_NSG_UFID_MN_RESIID_RG_RESIID_PAISDT_INVESTID_OCUPA_NCLASSI_FINAT_ATIVIDAAT_LAMINAAT_SINTOMATPAUTOCTOCOUFINFCOPAISINFCOMUNINFLOC_INFDEXAMERESULTPMMPCRUZTRA_ESQUEMDSTRAESQUEDTRATADT_ENCERRA
2442B542017-12-2120175120173333034022727842017-12-052017491951-11-214066M6109333303401NaT211102017-12-211NaNNoneNaT
2452B542017-12-2220175120173333045522883382017-10-032017401985-05-044032M6408333304551NaT71140515112AP1160053CUPI2017-12-2242280.0412017-12-23NaT
2462B542017-12-2420175220173333045522883382017-12-232017511984-01-194033F5406333304551NaT2101102017-12-241NaNNoneNaT
2472B542017-12-2720175220173333045522883382017-12-252017521986-03-114031M6909333304551NaT2101102017-12-271NaNNoneNaT
2482B542017-12-2720175220173333045522883382017-12-122017502009-06-034008F6909333304551NaT2101202017-12-271NaNNoneNaT
2492B542017-12-2820175220173333045530462812017-12-122017501983-05-054034M6106333304551NaT2103102017-12-281NaNNoneNaT
2502B542017-12-2820175220173333045522883382017-12-272017521993-11-084024M6909333301701NaT2101102017-12-281NaNNoneNaT
2512B542017-12-2820175220173333045554628862017-12-212017511983-11-264034M6108333301201NaT11021231MALA2017-12-28243.0199ARTEMETHER+LUM E PRIMAQ2017-12-28NaT
2522B542017-12-2920175220173333045530059922017-12-282017521949-04-204068M6108333304551NaT9999932102102017-12-291NaNNoneNaT
2532B542017-12-3120180120173333045530059922017-12-312018011983-06-084034F5408333304551NaT2211052101102017-12-311NaNNoneNaT